The development of new AUV technology has increased the range of tasks that AUVs can tackle and the length of their operations. As a result, AUVs are capable of handling highly complex operations. However, these missions do not easily fit the traditional approach of defining a mission as a series of pre-planned waypoints, because it is not possible to know in advance everything that might occur during the mission. This leads to a gap between the operator's expectations and the actual mission performance, which in turn can reduce the level of trust between the operator and the AUV and cause unnecessary mission interruptions. To bridge this gap between the robot's behavior and the operator's expectations, this work aims to provide a framework that explains, in an easily understandable way, the decisions and actions taken by the autonomous vehicle during a mission. In addition, the aim is for the system to be autonomy-agnostic, so that it can be added as an additional layer on top of any autonomy architecture. To make the approach applicable to different autonomous systems equipped with different autonomy frameworks, this work decouples the inner workings of the autonomy from the decision points and the resulting executed actions by applying knowledge distillation. Finally, to present the explanations to the operator in a more natural way, the output of the distilled decision tree is combined with natural language explanation and reported to the operator as sentences. To this end, an additional step called Concept2Text generation is added at the end of the explanation pipeline.
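As a rough illustration of the distillation-plus-explanation idea described above (not the authors' implementation), the sketch below logs hypothetical decision points, fits an interpretable decision tree as the distilled surrogate, and turns its prediction into a templated sentence; all feature names, actions, and templates are invented for the example.

```python
# Minimal sketch of the distillation + explanation idea (not the paper's code).
# Assumes a log of mission state features and the autonomy's chosen actions;
# all feature/action names here are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["battery_level", "obstacle_distance_m", "depth_m", "comms_ok"]
ACTIONS = ["continue_waypoint", "surface", "abort_mission"]

# Hypothetical logged decision points: (state features, chosen action index)
X = np.array([[0.9, 40.0, 15.0, 1],
              [0.2, 35.0, 20.0, 1],
              [0.8,  2.0, 10.0, 1],
              [0.7, 30.0, 12.0, 0]])
y = np.array([0, 1, 2, 1])

# Knowledge distillation into an interpretable surrogate
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

def explain(state):
    """Template-based, Concept2Text-style sentence for one decision."""
    action = ACTIONS[tree.predict([state])[0]]
    reasons = []
    if state[0] < 0.3:
        reasons.append("the battery level is low")
    if state[1] < 5.0:
        reasons.append("an obstacle is very close")
    if not state[3]:
        reasons.append("communications were lost")
    because = " and ".join(reasons) if reasons else "the mission plan allows it"
    return f"The vehicle chose to {action.replace('_', ' ')} because {because}."

print(explain([0.25, 30.0, 18.0, 1]))
```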
Recent advances in deep learning techniques have sparked fundamental improvements in the autonomy of ground vehicles. Marine coastal Autonomous Surface Vehicles (ASVs) that are regularly used for surveillance, monitoring, and other routine tasks can benefit from such autonomy. Long-duration deep-sea transportation activities are an additional opportunity. The terrains for these two use cases are quite different: the first, coastal waters, contains many obstacles, structures, and human presence, while the latter is mostly devoid of such obstructions. What both terrains share is variation in environmental conditions. Robust labeled datasets mapping such terrains are crucial for improving the situational awareness that can drive autonomy. However, only a limited number of such maritime datasets are available, and they consist primarily of optical images. Although long-wave infrared (LWIR) is a strong complement to the optical spectrum that helps under extreme lighting conditions, no labeled public dataset with LWIR images currently exists. In this paper, we fill this gap by presenting a labeled dataset of more than 2,900 LWIR segmented images captured in a coastal maritime environment under diverse conditions. The images are labeled using instance segmentation and classified into seven categories: sky, water, obstacle, living obstacle, bridge, self, and background. We also evaluate the dataset on three deep learning architectures (UNet, PSPNet, DeepLabv3) and provide a detailed analysis of their efficacy. Although the dataset focuses on coastal terrain, it can equally benefit the deep-sea use case: such terrain has less traffic, and a classifier trained on cluttered environments will be able to handle sparse scenes effectively. We share this dataset with the research community in the hope that it spurs new scene-understanding capabilities for maritime environments.
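For context, here is a minimal sketch of how such a seven-class dataset might be used to train one of the evaluated architectures (DeepLabv3, via torchvision). The placeholder tensors stand in for the actual LWIR frames and masks; none of this is the authors' training code.

```python
# A minimal sketch of training a 7-class segmentation model on LWIR frames,
# assuming images and integer masks are available as tensors (the data below
# is a placeholder, not the released dataset or its loader).
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 7  # sky, water, obstacle, living obstacle, bridge, self, background

# Placeholder data: 8 single-channel LWIR frames replicated to 3 channels
images = torch.rand(8, 3, 256, 256)
masks = torch.randint(0, NUM_CLASSES, (8, 256, 256))
loader = DataLoader(TensorDataset(images, masks), batch_size=2, shuffle=True)

model = deeplabv3_resnet50(num_classes=NUM_CLASSES)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for imgs, targets in loader:
    logits = model(imgs)["out"]            # (B, NUM_CLASSES, H, W)
    loss = criterion(logits, targets)      # per-pixel cross-entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```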
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
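A simplified sketch of the NAIVEATTACK idea only, under assumed hyperparameters: stamp a small trigger patch on a fraction of the raw images and relabel them to an attacker-chosen class before any (unmodified) distillation method is run. DOORPING's iterative trigger updates during distillation are not shown.

```python
# Sketch of trigger injection before distillation (illustrative hyperparameters,
# not the paper's exact settings).
import torch

def naive_attack(images, labels, target_class=0, poison_rate=0.1, patch=4):
    """images: (N, C, H, W) in [0, 1]; labels: (N,) integer class labels."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(poison_rate * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    # White square trigger in the bottom-right corner of each poisoned image
    images[idx, :, -patch:, -patch:] = 1.0
    labels[idx] = target_class
    return images, labels

# Usage: poison the raw data, then hand it to any dataset distillation method
raw_x, raw_y = torch.rand(1000, 3, 32, 32), torch.randint(0, 10, (1000,))
poisoned_x, poisoned_y = naive_attack(raw_x, raw_y)
# distilled_x, distilled_y = some_distillation_method(poisoned_x, poisoned_y)
```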
We present a dynamic path planning algorithm to navigate an amphibious rotorcraft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotorcraft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow and the required control inputs. The rotorcraft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
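Below is a minimal, self-contained PRM sketch in 3-D for intuition; it omits the quaternion dynamics, the layered control architecture, and the D* Lite replanning that the paper couples it with, and all parameters are illustrative.

```python
# Probabilistic roadmap sketch over a 3-D workspace with spherical obstacles.
import numpy as np
import networkx as nx

def collision_free(p, q, obstacles, steps=20):
    """Check the straight segment p->q against spherical obstacles (center, radius)."""
    for t in np.linspace(0.0, 1.0, steps):
        x = (1 - t) * p + t * q
        if any(np.linalg.norm(x - c) < r for c, r in obstacles):
            return False
    return True

def build_prm(low, high, obstacles, n_samples=300, k=8):
    """Sample nodes and connect each to its k nearest collision-free neighbors."""
    rng = np.random.default_rng(0)
    samples = rng.uniform(low, high, size=(n_samples, 3))
    graph = nx.Graph()
    graph.add_nodes_from(range(n_samples))
    for i, p in enumerate(samples):
        dists = np.linalg.norm(samples - p, axis=1)
        for j in np.argsort(dists)[1:k + 1]:
            if collision_free(p, samples[j], obstacles):
                graph.add_edge(i, int(j), weight=float(dists[j]))
    return graph, samples

# Usage: one spherical obstacle inside a 10 m cube
obstacles = [(np.array([5.0, 5.0, 5.0]), 2.0)]
graph, samples = build_prm(np.zeros(3), np.full(3, 10.0), obstacles)
start = int(np.argmin(np.linalg.norm(samples - np.zeros(3), axis=1)))
goal = int(np.argmin(np.linalg.norm(samples - np.full(3, 10.0), axis=1)))
if nx.has_path(graph, start, goal):
    print(nx.shortest_path(graph, start, goal, weight="weight"))
```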
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
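A toy sketch of the masked image-token objective described above; sizes, the tokenizer, and the text encoder are stand-ins rather than Muse's actual components. A transformer is asked to predict randomly masked discrete image tokens given a text embedding, with the loss computed on masked positions only.

```python
# Toy masked image-token modeling step (not Muse itself): dimensions, the
# "VQ" tokens, and the text embedding are all placeholders.
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, DIM, MASK_ID = 1024, 256, 512, 1023

class MaskedImageTokenModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, DIM)
        self.text_proj = nn.Linear(768, DIM)   # 768 = assumed text-encoder width
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, image_tokens, text_embedding):
        x = self.tok_emb(image_tokens)                      # (B, SEQ_LEN, DIM)
        ctx = self.text_proj(text_embedding).unsqueeze(1)   # (B, 1, DIM)
        x = self.encoder(torch.cat([ctx, x], dim=1))        # prepend text context
        return self.head(x[:, 1:])                          # logits for image positions

model = MaskedImageTokenModel()
tokens = torch.randint(0, VOCAB - 1, (2, SEQ_LEN))          # pretend VQ image tokens
text_emb = torch.rand(2, 768)                               # pretend frozen-LLM embedding

mask = torch.rand(2, SEQ_LEN) < 0.5                         # mask roughly half the tokens
inputs = tokens.masked_fill(mask, MASK_ID)
logits = model(inputs, text_emb)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # loss on masked positions only
loss.backward()
```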
The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
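The snippet below illustrates only the general pattern the system describes, namely embedding documents with a pre-trained language model and ranking them by vector similarity; the model name and documents are placeholders, and this is not Logic Mill's actual backend or API.

```python
# Illustrative document-similarity pipeline (placeholder model and documents).
from sentence_transformers import SentenceTransformer
import numpy as np

docs = [
    "A method for electrochemical energy storage using lithium anodes.",
    "Deep learning for protein structure prediction.",
    "Battery cell chemistry improvements for electric vehicles.",
]
query = "Advances in lithium-ion battery technology."

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ query_vec          # cosine similarity (vectors are normalized)
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {docs[i]}")
```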
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.
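For illustration, a hedged sketch of the kind of prompting the study examines; the prompt wording and model name are assumptions, not those used by the participants or the radiologists in the questionnaire.

```python
# Hypothetical report-simplification prompt (placeholder report and model name).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

report = "CT chest: No pulmonary embolism. Mild bibasilar atelectasis. ..."
prompt = (
    "Explain this radiology report to a patient in plain, non-technical "
    "language, without omitting findings:\n\n" + report
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```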
Automated text analysis has become a widely used tool in political science. In this research, we use a BERT model trained on German party manifestos to identify the individual parties' contribution to the coalition agreement of 2021.
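A sketch of what the inference step could look like with such a fine-tuned classifier; the checkpoint path is hypothetical and the fine-tuning on party manifestos is not shown.

```python
# Hypothetical inference step: score coalition-agreement sentences by party.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="path/to/bert-finetuned-on-party-manifestos",  # hypothetical checkpoint
)

coalition_sentences = [
    "Wir werden den Ausbau erneuerbarer Energien drastisch beschleunigen.",
    "Die Schuldenbremse wird im Rahmen des Grundgesetzes eingehalten.",
]
for sentence, pred in zip(coalition_sentences, classifier(coalition_sentences)):
    print(f"{pred['label']} ({pred['score']:.2f}): {sentence}")
```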